Table of Contents

Data Description

Data source: https://www.kaggle.com/harlfoxem/housesalesprediction

Imports

Load the Data

Features

Correlations

Pandas Profiling

Train-Test Split

Target Distribution in Train and Test

Modelling: Random Forest

Yellowbrick Visualization

Prediction Error vs Truth

Random Forest Confidence Interval

References:

Model Explanation Using LIME

Model Interpretation Using ELI5

Feature Importances

ELI5's Permutation Importance on the same features

Feature importance as a box plot

Weights of a tree in a small forest

sklearn Random Forest plot tree using graphviz

Decision Tree visualization using dtreeviz

Ref: https://github.com/parrt/dtreeviz